NTTMU-SCHEMA BeCalm API in BioCreative V.5
Abstract
With the emergence of new experimental techniques, the amount of available biomedical data has increased remarkably. Processing and mining large volumes of chemical data has become a challenging problem. To address this challenge, we developed SCHEMA (Spark-based CHEMicAl entity recognizer), a robust and efficient chemical entity recognition system built on top of Apache Spark. SCHEMA follows the asynchronous queue design pattern, which is used in service-oriented architectures to provide scalable and resilient services. SCHEMA can retrieve patents in the form of unstructured free text from different websites and recognize the chemical named entities described in them. A RESTful Web application programming interface is provided for programmatic interaction with SCHEMA. Using the custom request tests of the BeCalm (Biomedical annotation metaserver) platform, the test results showed that SCHEMA can process 5,000 patents within 5 minutes, an average of only 0.06 seconds per patent, including data fetch and analysis time.
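The asynchronous queue pattern and the throughput figure in the abstract can be illustrated with a minimal sketch. This is not SCHEMA's implementation: it uses plain Python threads instead of Apache Spark, and the names `TOY_LEXICON`, `fetch_patent`, `recognize`, `process`, and `seconds_per_patent` are hypothetical, introduced only for illustration. The recognizer is a toy dictionary look-up and the "fetch" step reads from an in-memory dictionary rather than a website.

```python
import queue
import threading

# Hypothetical toy lexicon; the abstract does not describe SCHEMA's recognizer.
TOY_LEXICON = {"aspirin", "ethanol", "benzene"}


def fetch_patent(patent_id, store):
    """Stand-in for retrieving unstructured patent text from a website."""
    return store[patent_id]


def recognize(text):
    """Toy dictionary look-up returning sorted (start, end, mention) tuples."""
    lowered = text.lower()
    found = []
    for term in TOY_LEXICON:
        start = lowered.find(term)
        while start != -1:
            end = start + len(term)
            found.append((start, end, text[start:end]))
            start = lowered.find(term, start + 1)
    return sorted(found)


def worker(requests, results, store):
    """Consume patent ids from the queue until a None sentinel arrives."""
    while True:
        patent_id = requests.get()
        if patent_id is None:
            requests.task_done()
            break
        results[patent_id] = recognize(fetch_patent(patent_id, store))
        requests.task_done()


def process(store, n_workers=4):
    """Enqueue every patent id, let the workers drain the queue, return results."""
    requests = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=worker, args=(requests, results, store))
        for _ in range(n_workers)
    ]
    for thread in threads:
        thread.start()
    for patent_id in store:
        requests.put(patent_id)
    for _ in threads:
        requests.put(None)  # one sentinel per worker
    requests.join()
    for thread in threads:
        thread.join()
    return results


def seconds_per_patent(n_patents, minutes):
    """Average wall-clock time per patent for a batch."""
    return minutes * 60 / n_patents
```

Because requests are queued rather than handled inline, the component that accepts requests stays decoupled from the workers that fetch and annotate, which is the property the abstract associates with scalable and resilient services. Calling `seconds_per_patent(5000, 5)` reproduces the reported average: 300 seconds divided by 5,000 patents is 0.06 seconds per patent.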
Similar resources
Tagger: BeCalm API for rapid named entity recognition
Most BioCreative tasks to date have focused on assessing the quality of text-mining annotations in terms of precision and recall. Interoperability, speed, and stability are, however, other important factors to consider for practical applications of text mining. The new BioCreative/BeCalm TIPS task focuses purely on these. To participate in this task, I implemented a BeCalm API within the real-ti...
Micro-RNA Recognition in Patents in BioCreative V.5
MicroRNAs (miRNAs) have been considered as good candidates for early detection or prognosis biomarkers for various diseases. Patents related to methods of identifying, isolating and amplifying miRNAs and potential use of miRNAs as biomarkers for cancers are increasing rapidly. In this work, we extend our miRNA recognition method based on the statistical principle-based approach and develop a we...
DUTIR at the BioCreative V.5.BeCalm Tasks: A BLSTM-CRF Approach for Biomedical Entity Recognition in Patents
Patents contain a significant amount of information. Biomedical text mining in patents has recently received much attention, especially in the medicinal chemistry domain. The BioCreative V.5.BeCalm tasks focus on biomedical entity recognition in patents. This paper describes our method used to create our submissions to the Chemical Entity Mention recognition (CEMP) and Gene and Protein Rela...
CRFVoter: Chemical Entity Mention, Gene and Protein Related Object recognition using a conglomerate of CRF based tools
This paper relates to the two offline BioCreative V.5 BeCalm tasks. The first challenge is CEMP, the recognition of chemical named entity mentions. The second challenge is GPRO, the recognition of gene and protein related objects in running text. We focus on training and optimizing state-of-the-art solutions for named entity tagging for CEMP and GPRO. Finally, we present CRFVoter, a two staged ...
A hybrid text mining system for chemical entity recognition and classification using dictionary look-up and pattern matching @ BeCalm challenge evaluation workshop
Chemicals used as therapeutics and investigational agents have recently received much attention in clinical research and applications. However, automated approaches to recognizing and categorizing chemical entities in biomedical text are challenging because of the wide variety of morphologies and nomenclature. We present here a hybrid text mining system that combines chemical lexicon and patterns for re...
Publication date: 2017